
    Bandit Online Learning in Pseudo-Monotone Games with Multi-Point Pseudo-Gradient Estimate

    Non-cooperative games serve as a powerful framework for capturing the interactions among self-interested players and apply to a wide range of practical scenarios, from power management to drug delivery. Although most existing solution algorithms assume the availability of first-order information or full knowledge of the objectives and others' action profiles, there are situations where the only information at the players' disposal is the realized objective function values. In this paper, we devise a bandit online learning algorithm that integrates the optimistic mirror descent scheme and multi-point pseudo-gradient estimates. We further demonstrate that the actual sequence of play generated by the algorithm converges almost surely to a critical point if the game under study is merely coherent, without resorting to extra Tikhonov regularization terms or additional norm conditions. Finally, we illustrate the validity of the proposed algorithm via a Rock-Paper-Scissors game and a least-squares estimation game.

    A Bandit Learning Method for Continuous Games under Feedback Delays with Residual Pseudo-Gradient Estimate

    Learning in multi-player games can model a large variety of practical scenarios, where each player seeks to optimize its own local objective function, which in turn depends on the actions taken by others. Motivated by the frequent absence of first-order information such as partial gradients in solving local optimization problems and the prevalence of asynchronicity and feedback delays in multi-agent systems, we introduce a bandit learning algorithm that integrates mirror descent, residual pseudo-gradient estimates, and a priority-based feedback utilization strategy to contend with these challenges. We establish that for pseudo-monotone plus games, the actual sequences of play generated by the proposed algorithm converge almost surely to critical points. Compared with the existing method, the proposed algorithm yields more consistent estimates with less variation and allows for more aggressive choices of parameters. Finally, we illustrate the validity of the proposed algorithm through a thermal load management problem of building complexes.

    A Study of the Duality between Kalman Filters and LQR Problems

    The goal of this paper is to study a connection between the finite-horizon Kalman filtering and LQR problems for discrete-time LTI systems. Motivated by recent duality results on the LQR problem, a Lagrangian dual relation is used to prove that the Kalman filtering problem is a Lagrange dual problem of the LQR problem.

    A Semidefinite Programming Formulation of the LQR Problem and Its Dual

    The goal of this paper is to derive a modified formulation of the finite-horizon LQR problem, which can be cast as a semidefinite programming problem (SDP). In addition, based on Lagrangian duality, its dual problem is studied. We establish connections between the proposed primal-dual conditions and existing results. As an application of the proposed results, the decentralized LQR analysis and design problems are addressed. In particular, using the structure of the derived LQR formulations, a sufficient but simple and convex surrogate problem is developed for solving decentralized LQR design problems.

    Stabilizing Switched Linear Systems under Adversarial Switching

    The problem of stabilizing discrete-time switched linear control systems using continuous input by the user and against adversarial switching by an adversary is studied. It is assumed that the adversary has the advantage in that at each time it knows the user's decision on the continuous control input but not vice versa. Stabilizability conditions and bounds on the fastest stabilizing rates are derived. Examples are given to illustrate the results.